
Based on our assumption, for $w_i$ we formulate the ideal bimodal distribution as

$$P(w_i \mid \Theta_i) = \sum_{k=1}^{2} \beta_i^k \, p(w_i \mid \Theta_i^k), \tag{6.49}$$

where the number of distributions is set to 2 in this paper. $\Theta_i^k = \{\mu_i^k, \sigma_i^k\}$ denotes the parameters of the $k$-th distribution, i.e., $\mu_i^k$ denotes the mean value and $\sigma_i^k$ the variance, respectively.
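
As a concrete reference point, the following NumPy sketch evaluates this bimodal density. It anticipates the Gaussian form of Eqs. (6.51)–(6.52) below, treats `sigma` as a variance per the definition above, and all function and array names are illustrative rather than from the original text.

```python
import numpy as np

def component_pdf(w, mu, sigma):
    # Normalized Gaussian component p(w | Theta_i^k) = f / Omega,
    # cf. Eqs. (6.51)-(6.52); sigma holds *variances*, as in the text.
    w = np.asarray(w, dtype=float)[:, None]          # shape (m_i, 1)
    f = np.exp(-(w - mu) ** 2 / (2.0 * sigma))       # Eq. (6.52)
    return f / np.sqrt(2.0 * np.pi * np.abs(sigma))  # divide by Omega

def mixture_pdf(w, beta, mu, sigma):
    # Eq. (6.49): two-component bimodal density over the layer weights.
    return (beta * component_pdf(w, mu, sigma)).sum(axis=1)
```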

To solve the GMM with the observed data $w_i$, i.e., the weight ensemble of the $i$-th layer, we introduce the hidden variable $\xi_i^{jk}$ to formulate the maximum likelihood estimation (MLE) of the GMM as

$$\xi_i^{jk} = \begin{cases} 1, & w_i^j \in p_i^k \\ 0, & \text{otherwise}, \end{cases} \tag{6.50}$$

where ξjk

i

is the hidden variable that describes the affiliation of wj

i and pk

i (simplified deno-

tation of p(wi|Θk

i )). We then define the likelihood function P(wj

i , ξjk

i |Θk

i ) as

$$P(w_i^j, \xi_i^{jk} \mid \Theta_i^k) = \prod_{k=1}^{2} (\beta_i^k)^{|p_i^k|} \prod_{j=1}^{m_i} \left[ \frac{1}{\Omega} f(w_i^j, \mu_i^k, \sigma_i^k) \right]^{\xi_i^{jk}}, \tag{6.51}$$

where $\Omega = \sqrt{2\pi |\sigma_i^k|}$, $|p_i^k| = \sum_{j=1}^{m_i} \xi_i^{jk}$, and $m_i = \sum_{k=1}^{2} |p_i^k|$. The function $f(w_i^j, \mu_i^k, \sigma_i^k)$ is defined as

$$f(w_i^j, \mu_i^k, \sigma_i^k) = \exp\left( -\frac{(w_i^j - \mu_i^k)^2}{2\sigma_i^k} \right). \tag{6.52}$$
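
Continuing the sketch above, the complete-data likelihood of Eq. (6.51) can be evaluated in log form for numerical stability; the argument `xi` is a hypothetical one-hot matrix holding the assignments $\xi_i^{jk}$.

```python
def complete_loglik(w, xi, beta, mu, sigma):
    # Log of Eq. (6.51) for a one-hot assignment matrix xi of shape (m_i, 2).
    w = np.asarray(w, dtype=float)[:, None]
    p_sizes = xi.sum(axis=0)                               # |p_i^k| = sum_j xi_i^{jk}
    log_f = -(w - mu) ** 2 / (2.0 * sigma)                 # log f, Eq. (6.52)
    log_omega = 0.5 * np.log(2.0 * np.pi * np.abs(sigma))  # log Omega
    return (p_sizes * np.log(beta)).sum() + (xi * (log_f - log_omega)).sum()
```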

Hence, for every single weight $w_i^j$, $\xi_i^{jk}$ can be computed by maximizing the likelihood as

$$\max_{\xi_i^{jk},\, \forall j,k} \; E\left[ \log P(w_i^j, \xi_i^{jk} \mid \Theta_i^k) \;\middle|\; w_i^j, \Theta_i^k \right], \tag{6.53}$$

where $E(\cdot)$ denotes the conditional expectation. Therefore, the maximum likelihood estimate $\hat{\xi}_i^{jk}$ is calculated as

$$\hat{\xi}_i^{jk} = E(\xi_i^{jk} \mid w_i^j, \Theta_i^k) = P(\xi_i^{jk} = 1 \mid w_i^j, \Theta_i^k) = \frac{\beta_i^k \, p(w_i^j \mid \Theta_i^k)}{\sum_{k'=1}^{2} \beta_i^{k'} \, p(w_i^j \mid \Theta_i^{k'})}. \tag{6.54}$$
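
A minimal E-step in the same illustrative NumPy style, reusing `component_pdf` from the earlier sketch:

```python
def e_step(w, beta, mu, sigma):
    # Eq. (6.54): posterior responsibilities xi_hat_i^{jk}, shape (m_i, 2).
    num = beta * component_pdf(w, mu, sigma)   # beta_i^k * p(w_i^j | Theta_i^k)
    return num / num.sum(axis=1, keepdims=True)
```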

After the expectation step, we perform the maximization step to compute $\Theta_i^k$ as

$$\hat{\mu}_i^k = \frac{\sum_{j=1}^{m_i} \hat{\xi}_i^{jk} w_i^j}{\sum_{j=1}^{m_i} \hat{\xi}_i^{jk}}, \tag{6.55}$$

$$\hat{\sigma}_i^k = \frac{\sum_{j=1}^{m_i} \hat{\xi}_i^{jk} (w_i^j - \hat{\mu}_i^k)^2}{\sum_{j=1}^{m_i} \hat{\xi}_i^{jk}}, \tag{6.56}$$

$$\hat{\beta}_i^k = \frac{\sum_{j=1}^{m_i} \hat{\xi}_i^{jk}}{m_i}. \tag{6.57}$$
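
The corresponding M-step updates can be sketched as below, wrapped in a small EM driver that alternates Eq. (6.54) with Eqs. (6.55)–(6.57); the symmetric $\pm$ initialization is an assumption for illustration, not part of the original text.

```python
def m_step(w, xi):
    # Eqs. (6.55)-(6.57): update means, variances, and mixing weights.
    w = np.asarray(w, dtype=float)
    n_k = xi.sum(axis=0)                                     # sum_j xi_hat_i^{jk}
    mu = (xi * w[:, None]).sum(axis=0) / n_k                 # Eq. (6.55)
    sigma = (xi * (w[:, None] - mu) ** 2).sum(axis=0) / n_k  # Eq. (6.56)
    beta = n_k / w.size                                      # Eq. (6.57)
    return mu, sigma, beta

def fit_bimodal_gmm(w, n_iter=50):
    # Alternate E- and M-steps; the symmetric +/- init reflects the
    # assumed bimodal shape of the layer weights (an illustration).
    w = np.asarray(w, dtype=float)
    mu = np.array([-1.0, 1.0]) * np.abs(w).mean()
    sigma = np.full(2, w.var())
    beta = np.full(2, 0.5)
    for _ in range(n_iter):
        xi = e_step(w, beta, mu, sigma)
        mu, sigma, beta = m_step(w, xi)
    return mu, sigma, beta
```

For instance, calling `fit_bimodal_gmm(weights.ravel())` on the flattened weights of the $i$-th layer would return estimates of $\{\hat{\mu}_i^k, \hat{\sigma}_i^k, \hat{\beta}_i^k\}$ for the two components.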